Policy Gradient Coagent Networks
Author
Abstract
We present a novel class of actor-critic algorithms for actors consisting of sets of interacting modules. We present, analyze theoretically, and empirically evaluate an update rule for each module, which requires only local information: the module’s input, output, and the TD error broadcast by a critic. Such updates are necessary when computation of compatible features becomes prohibitively difficult and are also desirable to increase the biological plausibility of reinforcement learning methods.
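The local update described in the abstract can be illustrated with a minimal sketch. It assumes each module is a softmax policy over discrete outputs, and that its parameters move by step size × TD error × the gradient of the log-probability of the module's output given its input. The class name `SoftmaxCoagent`, the softmax parameterisation, the step size, and the example TD error value are illustrative assumptions, not details taken from the paper.

```python
import math
import random


class SoftmaxCoagent:
    """One stochastic module (hypothetical name): input vector -> discrete output."""

    def __init__(self, n_inputs, n_outputs, step_size=0.1, rng=None):
        # One weight row per possible output; all zeros gives a uniform policy.
        self.weights = [[0.0] * n_inputs for _ in range(n_outputs)]
        self.step_size = step_size
        self.rng = rng if rng is not None else random.Random(0)

    def probabilities(self, x):
        """Softmax over linear scores, shifted by the max for numerical stability."""
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, x):
        """Draw this module's output from its current policy."""
        probs = self.probabilities(x)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def local_update(self, x, output, td_error):
        """Update using only local information: input x, own output, broadcast TD error.

        For a softmax policy, d/dW[k][i] log pi(output | x) = (1[k == output] - p_k) * x_i.
        """
        probs = self.probabilities(x)
        for k, row in enumerate(self.weights):
            indicator = 1.0 if k == output else 0.0
            for i, xi in enumerate(x):
                row[i] += self.step_size * td_error * (indicator - probs[k]) * xi


# Usage: two modules in sequence; the first module's output is the second's input.
first = SoftmaxCoagent(n_inputs=2, n_outputs=2)
second = SoftmaxCoagent(n_inputs=2, n_outputs=3)

state = [1.0, 0.0]
hidden = first.sample(state)
hidden_one_hot = [1.0 if j == hidden else 0.0 for j in range(2)]
action = second.sample(hidden_one_hot)

td_error = 0.5  # assumed value; in practice a critic would compute and broadcast it
first.local_update(state, hidden, td_error)
second.local_update(hidden_one_hot, action, td_error)
```

Neither module needs the other's parameters or gradients in this sketch: both receive the same scalar TD error and otherwise rely only on their own input and output.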
Similar resources
Recurrent Predictive State Policy Networks
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to act...
Lagrange policy gradient
Most algorithms for reinforcement learning work by estimating action-value functions. Here we present a method that uses Lagrange multipliers, the costate equation, and multilayer neural networks to compute policy gradients. We show that this method can find solutions to time-optimal control problems, driving linear mechanical systems quickly to a target configuration. On these tasks its perfor...
Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning
This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (p...
Adaptive Route Selection Policy Based on Back Propagation Neural Networks
One of the key issues in the study of multiple route protocols in mobile ad hoc networks (MANETs) is how to select routes to the packet transmission destination. There are currently two route selection methods: primary routing policy and load-balancing policy. Many ad hoc routing protocols are based on primary (fastest or shortest but busiest) routing policy from the self-standpoint of traffic ...
Parameter-exploring policy gradients
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than obtained by regular policy gradient methods. We show that for several complex control tasks, including robust standing with a humanoid robot, this method ...
Journal title:
Volume, issue
Pages -
Publication date: 2011